A Modification of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays

نویسندگان

  • Rodrigo de Castro Miranda
  • Mauricio Ayala-Rincón
چکیده

Approximate string matching is an essential problem in many areas related to Computer Science including biological sequence processing. The standard solution of this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic programming table and reaching space and running time in O(nk), where n > m and k is the maximum number of admissible differences. Suffix trees are used for preprocessing the sequences allowing an O(1) running time computation of the longest common extensions between substrings. One of the practical drawbacks of the Landau-Vishkin algorithm is the excessive use of space inherent to the use of suffix trees. In fact, although suffix trees can be handled in linear space their construction and manipulation implies high multiplying factors to this linear behavior. We present a variation of the Landau-Vishkin algorithm which instead of suffix trees uses suffix arrays for computing the longest common extensions, thereby improving actual space usage.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modifications of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays and Efficient RMQ computations

Approximate string matching is an important problem in Computer Science. The standard solution for this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic programming table and reaching space and running time in O(nk), wher...

متن کامل

Computing Longest Common Substrings Via Suffix Arrays

Given a set of N strings A = {α1, . . . , αN} of total length n over alphabet Σ one may ask to find, for a fixed integer K, 2 ≤ K ≤ N , the longest substring β that appears in at least K strings in A. It is known that this problem can be solved in O(n) time with the help of suffix trees. However, the resulting algorithm is rather complicated. Also, its running time and memory consumption may de...

متن کامل

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.

متن کامل

Suffix Trees and Suffix Arrays

Iowa State University 1.1 Basic Definitions and Properties . . . . . . . . . . . . . . . . . . . . 1-1 1.2 Linear Time Construction Algorithms . . . . . . . . . . . . . 1-4 Suffix Trees vs. Suffix Arrays • Linear Time Construction of Suffix Trees • Linear Time Construction of Suffix Arrays • Space Issues 1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

متن کامل

Lightweight Parameterized Suffix Array Construction

We present a first algorithm for direct construction of parameterized suffix arrays and parameterized longest common prefix arrays for non-binary strings. Experimental results show that our algorithm is much faster than näıve methods.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005